Gradient Algorithms for Exploration/Exploitation Trade-Offs: Global and Local Variants

Authors

  • Michel Tokic
  • Günther Palm

Abstract

Gradient-following algorithms are used to adapt exploration parameters efficiently in temporal-difference learning with discrete action spaces. Global and local variants are evaluated in discrete and continuous state spaces. The global variant is memory-efficient because it requires exploratory data only for starting states. In contrast, the local variant requires exploratory data for every state of the state space, but produces exploratory behavior only in states with improvement potential. Our results suggest that gradient-based exploration can be used efficiently in combination with off- and on-policy algorithms such as Q-learning and Sarsa.
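The abstract does not give the update rules, so the following is only a minimal sketch under our own assumptions, not the authors' algorithm. It assumes ε-greedy exploration with ε = σ(θ), where the explore/exploit decision is treated as a Bernoulli unit whose parameter θ is adapted by a REINFORCE-style likelihood-ratio gradient. In the assumed "global" variant a single θ is updated from the discounted return observed from the start state; in the assumed "local" variant each state has its own θ, updated from a one-step advantage signal. The `Corridor` task, the `run` function, the advantage signal, and all parameter values are hypothetical and chosen for illustration.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class Corridor:
    """Hypothetical toy task: states 0..n-1, action 0 = left, 1 = right.
    Reaching state n-1 yields reward 1 and ends the episode."""

    def __init__(self, n: int = 5):
        self.n = n

    def step(self, s: int, a: int):
        s2 = max(0, s - 1) if a == 0 else s + 1
        done = s2 == self.n - 1
        return s2, (1.0 if done else 0.0), done


def run(variant: str, episodes: int = 300, n: int = 5, alpha: float = 0.5,
        gamma: float = 0.9, beta: float = 0.1, max_steps: int = 100, seed: int = 0):
    """Q-learning whose epsilon-greedy rate eps = sigmoid(theta) is adapted by a
    REINFORCE-style gradient (an assumption, not the paper's exact rule).
    'global': one theta, updated once per episode from the start-state return.
    'local':  one theta per state, updated every step from a TD-based signal."""
    if variant not in ("global", "local"):
        raise ValueError("variant must be 'global' or 'local'")
    rng = random.Random(seed)
    env = Corridor(n)
    q = [[0.0, 0.0] for _ in range(n)]
    theta = [0.0] * n if variant == "local" else [0.0]
    baseline = 0.0  # running average of episode returns (global variant only)

    def greedy(s: int) -> int:
        best = max(q[s])
        return rng.choice([a for a in (0, 1) if q[s][a] == best])  # random tie-break

    for _ in range(episodes):
        s, ret, disc, grad_sum = 0, 0.0, 1.0, 0.0
        for _ in range(max_steps):
            eps = sigmoid(theta[s if variant == "local" else 0])
            explore = rng.random() < eps
            a = rng.randrange(2) if explore else greedy(s)
            # d/dtheta log P(decision): (1 - eps) if explored, -eps if exploited
            g = (1.0 - eps) if explore else -eps
            s2, r, done = env.step(s, a)
            target = r if done else r + gamma * max(q[s2])
            if variant == "local":
                # Assumed local signal: advantage of the taken action's target
                # over the current greedy value of s (computed before the Q update).
                adv = target - max(q[s])
                theta[s] = min(10.0, max(-10.0, theta[s] + beta * adv * g))
            q[s][a] += alpha * (target - q[s][a])  # off-policy Q-learning update
            ret += disc * r
            disc *= gamma
            grad_sum += g
            s = s2
            if done:
                break
        if variant == "global":
            # Episodic REINFORCE step with a running-average baseline.
            step = beta * (ret - baseline) * grad_sum
            theta[0] = min(10.0, max(-10.0, theta[0] + step))
            baseline += 0.1 * (ret - baseline)
    return q, [sigmoid(t) for t in theta]


if __name__ == "__main__":
    for variant in ("global", "local"):
        q, eps = run(variant)
        policy = ["right" if q[s][1] > q[s][0] else "left" for s in range(len(q) - 1)]
        print(variant, policy, [round(e, 3) for e in eps])
```

The sketch mirrors the memory trade-off the abstract describes: the global variant stores one exploration parameter, while the local variant stores one per state. Because the local signal is zero for greedy actions once values have converged and negative for unhelpful exploratory actions, ε in this toy setup tends to shrink in states where little improvement remains.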

Publication date: 2012